Lecture 21: Introduction to Time Series Data in ‘R’
Author
Neva Morgan
Published
April 21, 2025
Objective:
In this activity, you will download streamflow data from the Cache la Poudre River (USGS site 06752260) and analyze it using a few time series methods.
Setting Up:
library(zoo)
Warning: package 'zoo' was built under R version 4.4.3
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
library(timeSeries)
Warning: package 'timeSeries' was built under R version 4.4.3
Loading required package: timeDate
Attaching package: 'timeSeries'
The following object is masked from 'package:zoo':
time<-
The following objects are masked from 'package:graphics':
lines, points
# For some reason the ts package wouldn't download due to it being out of date or either my RStudio is out of date?library(xts)
Warning: package 'xts' was built under R version 4.4.3
library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.4.3
Warning: package 'ggplot2' was built under R version 4.4.3
Warning: package 'tidyr' was built under R version 4.4.3
Warning: package 'readr' was built under R version 4.4.3
Warning: package 'purrr' was built under R version 4.4.3
Warning: package 'dplyr' was built under R version 4.4.3
Warning: package 'lubridate' was built under R version 4.4.3
Warning: package 'tsibble' was built under R version 4.4.3
Registered S3 method overwritten by 'tsibble':
method from
as_tibble.grouped_df dplyr
Attaching package: 'tsibble'
The following object is masked from 'package:lubridate':
interval
The following object is masked from 'package:zoo':
index
The following objects are masked from 'package:base':
intersect, setdiff, union
library(feasts)
Warning: package 'feasts' was built under R version 4.4.3
Loading required package: fabletools
Warning: package 'fabletools' was built under R version 4.4.3
Attaching package: 'fabletools'
The following object is masked from 'package:yardstick':
accuracy
The following object is masked from 'package:parsnip':
null_model
The following objects are masked from 'package:infer':
generate, hypothesize
library(dplyr)
First, use this code to download the data from the USGS site.
library(dataRetrieval)
Warning: package 'dataRetrieval' was built under R version 4.4.3
# Example: Cache la Poudre River at Mouth (USGS site 06752260)poudre_flow <-readNWISdv(siteNumber ="06752260", # Download data from USGS for site 06752260parameterCd ="00060", # Parameter code 00060 = discharge in cfs)startDate ="2013-01-01", # Set the start dateendDate ="2023-12-31") |># Set the end daterenameNWISColumns() |># Rename columns to standard names (e.g., "Flow","Date")mutate(Date =yearmonth(Date)) |># Convert daily Date values into a year-month format (e.g., "2023 Jan")group_by(Date) |># Group the data by the new monthly Datesummarise(Flow =mean(Flow)) # Calculate the average daily flow for each month
Plot variable not specified, automatically selected `.vars = Flow`
ggplotly(pf_plot)
3. Subseries
Use gg_subseries to visualize the seasonal patterns in the data. This will help you identify any trends or seasonal cycles in the streamflow data.
# Load 'feasts' package before running:gg_subseries(pf_tbl) +labs(title ="Monthly Poudre River Flow Patterns",x ="Date",y ="Flow",subtitle ="ESS330 A21 | Neva Morgan") +theme_minimal()
Plot variable not specified, automatically selected `y = Flow`
Describe what you see in the plot. How are “seasons” defined in this plot? What do you think the “subseries” represent?
After plotting using the gg_subseries, the monthly flow rate for the Poudre River, appears to be at a higher level during May and June months, with an occasional increase of flow during April or September.
Seasons within this plot are defined by the months that are correlated with similar flow rate measurements to one another, the larger increae of flow could represent the end of spring moving into summer months (April - September).
From what we’ve learned from class, “subseries” are represented by the different years within the months of the data, showing how flow has changed from each month with multiple years being compared to one another.
4. Decompose
Use the model(STL(…)) pattern to decompose the time series data into its components: trend, seasonality, and residuals. Chose a window that you feel is most appropriate to this data…
# Making the Decomposition modelpf_decomp <- pf_tbl |>model(STL(Flow ~season(window ="periodic"))) |>components()# Visualizing with autoplotautoplot(pf_decomp) +labs(title ="STL Decomposition of Poudre River Flow", y ="Flow") +theme_minimal()
ggpubr::ggdensity(pf_decomp$remainder, main ="Residual Component")
shapiro.test(pf_decomp$remainder)
Shapiro-Wilk normality test
data: pf_decomp$remainder
W = 0.84083, p-value = 1.25e-10
Describe what you see in the plot. How do the components change over time? What do you think the trend and seasonal components represent?
After running a few extra tests to understand how the flow of the Poudre River has changed from 2013 to 2023, The window that showed the most alarming change for me at least was the Residual window. From what I can understand of the data presented, the flow has a pretty annual transition, peaking around May and June, and having a stagnant section in the months between. The flow has changed with time, it’s highest recorded in the later months of 2015, indicating a heavier rainfall or water accumulation in the river for that year. But it also shows dips in the data where the flow was in a negative rate, these could indicate a drier or dorughted year in terms of precipiation.
From what I understand from class and my other courses, the trend component is measuring the average of flow as it spans over the years, showing fluctuation from years prior as compared to the decrease we have now. The seasonal component shows the consistent peak that occurs from March to May as water is changing in it’s physical state and moving down through the watershed into the Poudre River.
Submission:
Upload a rendered qmd file to the course website. Make sure to include your code and any plots you created.
This should be an HTML file with self-contained: true.
It should not point to a local host, and must be the physical file.
Make sure to include your code and any plots you created and that the outputs render as you expect.